Overview

Dataset statistics

Number of variables: 17
Number of observations: 20279
Missing cells: 12956
Missing cells (%): 3.8%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 9.3 MiB
Average record size in memory: 480.6 B
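
The missing-cell percentage above can be checked from the other overview figures: it is the number of missing cells divided by the total number of cells (variables times observations). A minimal sketch of that arithmetic, using only numbers printed in this report (the variable names are illustrative):

```python
# Figures copied from the dataset statistics above.
n_variables = 17
n_observations = 20279
missing_cells = 12956

# Every observation has one cell per variable.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(total_cells)            # 344743
print(round(missing_pct, 1))  # 3.8
```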

Variable types

NUM: 11
CAT: 5
BOOL: 1

Reproduction

Analysis started: 2020-05-20 15:47:39.520144
Analysis finished: 2020-05-20 15:48:33.955742
Version: pandas-profiling v2.5.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

gift_id has a high cardinality: 20279 distinct values (High cardinality)
instock_date has a high cardinality: 17147 distinct values (High cardinality)
stock_update_date has a high cardinality: 14794 distinct values (High cardinality)
uk_date1 has a high cardinality: 16384 distinct values (High cardinality)
uk_date2 has a high cardinality: 7210 distinct values (High cardinality)
volumes has 12956 (63.9%) missing values (Missing)
instock_date only contains datetime values, but is categorical. Consider applying pd.to_datetime() (Type)
stock_update_date only contains datetime values, but is categorical. Consider applying pd.to_datetime() (Type)
uk_date1 only contains datetime values, but is categorical. Consider applying pd.to_datetime() (Type)
uk_date2 only contains datetime values, but is categorical. Consider applying pd.to_datetime() (Type)
gift_cluster has 3568 (17.6%) zeros (Zeros)
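
The type warnings above suggest pd.to_datetime() for the four date columns. A minimal sketch of that conversion follows. The small frame is a stand-in built from a few values shown in this report, and the explicit format string is an assumption based on the fixed 23-character values (for example 2015-09-25 13:22:53.000); the real data would be loaded from the source file instead.

```python
import pandas as pd

# Stand-in frame: two sample values per column, copied from this report.
df = pd.DataFrame({
    "instock_date": ["2015-09-25 13:22:53.000", "2015-09-22 14:30:05.000"],
    "stock_update_date": ["2017-03-12 18:39:44.000", "2017-03-12 20:39:44.000"],
    "uk_date1": ["2017-03-31 12:40:24.000", "2017-03-30 12:40:24.000"],
    "uk_date2": ["2016-11-04 02:00:00.000", "2016-11-05 02:00:00.000"],
})

date_columns = ["instock_date", "stock_update_date", "uk_date1", "uk_date2"]
for column in date_columns:
    # An explicit format skips per-value inference and raises on malformed strings.
    df[column] = pd.to_datetime(df[column], format="%Y-%m-%d %H:%M:%S.%f")

print(df.dtypes)
```

After the conversion the columns are datetime-typed rather than strings, so a profiler would no longer treat them as high-cardinality categoricals.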

Variables

gift_id
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE
Distinct count: 20279
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 158.6 KiB

Value Count Frequency (%)
GF_15330 1 < 0.1%
GF_14569 1 < 0.1%
GF_19303 1 < 0.1%
GF_20189 1 < 0.1%
GF_11946 1 < 0.1%
GF_19220 1 < 0.1%
GF_13718 1 < 0.1%
GF_24271 1 < 0.1%
GF_10883 1 < 0.1%
GF_18968 1 < 0.1%
Other values (20269) 20269 > 99.9%

Length

Max length: 8
Mean length: 7.559593668
Min length: 4

Value Count Frequency (%)
Decimal_Number 10 76.9%
Uppercase_Letter 2 15.4%
Connector_Punctuation 1 7.7%

Value Count Frequency (%)
Common 11 84.6%
Latin 2 15.4%

Value Count Frequency (%)
ASCII 13 100.0%

gift_type
Real number (ℝ≥0)

Distinct count: 1237
Unique (%): 6.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 739.5546625
Minimum: 1
Maximum: 1360
Zeros: 0
Zeros (%): 0.0%
Memory size: 158.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 66
Q1: 403
Median: 825
Q3: 1032
95-th percentile: 1271
Maximum: 1360
Range: 1359
Interquartile range (IQR): 629

Descriptive statistics

Standard deviation: 389.2169888
Coefficient of variation (CV): 0.5262856264
Kurtosis: -1.148934749
Mean: 739.5546625
Median Absolute Deviation (MAD): 339.8746067
Skewness: -0.2776329907
Sum: 14997429
Variance: 151489.8643
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1.0000e+00 4.0000e+00 9.5000e+00 1.1500e+01 1.7500e+01 ... 1.3135e+03 1.3145e+03 1.3155e+03 1.3375e+03 1.3600e+03], "bayesian blocks" binning strategy used)

Common values

Value Count Frequency (%)
1008 988 4.9%
1266 982 4.8%
660 889 4.4%
992 550 2.7%
896 493 2.4%
908 376 1.9%
1173 355 1.8%
415 347 1.7%
1111 318 1.6%
899 297 1.5%
Other values (1227) 14684 72.4%

Minimum 5 values

Value Count Frequency (%)
1 5 < 0.1%
2 3 < 0.1%
3 19 0.1%
5 1 < 0.1%
6 1 < 0.1%

Maximum 5 values

Value Count Frequency (%)
1360 1 < 0.1%
1357 2 < 0.1%
1356 2 < 0.1%
1355 5 < 0.1%
1354 6 < 0.1%

gift_category
Real number (ℝ≥0)

Distinct count: 852
Unique (%): 4.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 394.1715568
Minimum: 1
Maximum: 893
Zeros: 0
Zeros (%): 0.0%
Memory size: 158.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 38
Q1: 188
Median: 433
Q3: 534
95-th percentile: 815
Maximum: 893
Range: 892
Interquartile range (IQR): 346

Descriptive statistics

Standard deviation: 235.0777687
Coefficient of variation (CV): 0.5963844035
Kurtosis: -0.9591278385
Mean: 394.1715568
Median Absolute Deviation (MAD): 201.8403781
Skewness: 0.1777879101
Sum: 7993405
Variance: 55261.55736
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 1. 9. 10.5 11.5 12.5 ... 876.5 879.5 884.5 885.5 893. ], "bayesian blocks" binning strategy used)

Common values

Value Count Frequency (%)
188 2723 13.4%
534 2064 10.2%
433 1257 6.2%
38 643 3.2%
822 490 2.4%
305 400 2.0%
610 304 1.5%
450 284 1.4%
282 279 1.4%
12 275 1.4%
Other values (842) 11560 57.0%

Minimum 5 values

Value Count Frequency (%)
1 4 < 0.1%
3 1 < 0.1%
4 4 < 0.1%
6 4 < 0.1%
7 3 < 0.1%

Maximum 5 values

Value Count Frequency (%)
893 3 < 0.1%
892 2 < 0.1%
891 1 < 0.1%
890 5 < 0.1%
889 7 < 0.1%

gift_cluster
Real number (ℝ≥0)

ZEROS
Distinct count: 5414
Unique (%): 26.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3303.358548
Minimum: 0
Maximum: 7567
Zeros: 3568
Zeros (%): 17.6%
Memory size: 158.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 587
Median: 3231
Q3: 5787
95-th percentile: 7213
Maximum: 7567
Range: 7567
Interquartile range (IQR): 5200

Descriptive statistics

Standard deviation: 2541.082549
Coefficient of variation (CV): 0.7692421249
Kurtosis: -1.350521372
Mean: 3303.358548
Median Absolute Deviation (MAD): 2210.610126
Skewness: 0.1211027051
Sum: 66988808
Variance: 6457100.52
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.0000e+00 5.0000e-01 4.6500e+01 7.1500e+01 7.4500e+01 ... 7.4835e+03 7.5135e+03 7.5385e+03 7.5665e+03 7.5670e+03], "bayesian blocks" binning strategy used)

Common values

Value Count Frequency (%)
0 3568 17.6%
6489 60 0.3%
3371 59 0.3%
260 57 0.3%
6490 53 0.3%
6936 53 0.3%
6935 53 0.3%
2693 53 0.3%
2694 52 0.3%
2695 51 0.3%
Other values (5404) 16220 80.0%

Minimum 5 values

Value Count Frequency (%)
0 3568 17.6%
1 1 < 0.1%
8 1 < 0.1%
9 1 < 0.1%
10 1 < 0.1%

Maximum 5 values

Value Count Frequency (%)
7567 8 < 0.1%
7566 1 < 0.1%
7565 1 < 0.1%
7564 4 < 0.1%
7563 2 < 0.1%

instock_date
Categorical

HIGH CARDINALITY
TYPE DATE
UNIFORM
Distinct count: 17147
Unique (%): 84.6%
Missing: 0
Missing (%): 0.0%
Memory size: 158.6 KiB

Value Count Frequency (%)
2015-09-25 13:22:53.000 7 < 0.1%
2015-09-22 14:30:05.000 7 < 0.1%
2015-05-06 17:20:59.000 7 < 0.1%
2015-05-05 14:09:22.000 7 < 0.1%
2016-03-29 21:55:43.000 7 < 0.1%
2015-09-22 15:24:59.000 6 < 0.1%
2015-08-21 21:36:19.000 6 < 0.1%
2016-03-29 22:54:54.000 6 < 0.1%
2015-08-22 19:36:19.000 6 < 0.1%
2015-05-09 13:10:20.000 6 < 0.1%
Other values (17137) 20214 99.7%

Length

Max length: 23
Mean length: 23
Min length: 23

Value Count Frequency (%)
Decimal_Number 10 71.4%
Other_Punctuation 2 14.3%
Space_Separator 1 7.1%
Dash_Punctuation 1 7.1%

Value Count Frequency (%)
Common 14 100.0%

Value Count Frequency (%)
ASCII 14 100.0%

stock_update_date
Categorical

HIGH CARDINALITY
TYPE DATE
UNIFORM
Distinct count: 14794
Unique (%): 73.0%
Missing: 0
Missing (%): 0.0%
Memory size: 158.6 KiB

Value Count Frequency (%)
2017-03-12 18:39:44.000 17 0.1%
2017-03-12 20:39:44.000 14 0.1%
2017-03-11 17:39:44.000 14 0.1%
2017-04-02 15:42:25.000 13 0.1%
2017-03-12 16:39:44.000 13 0.1%
2017-03-12 19:39:44.000 12 0.1%
2017-04-03 15:42:26.000 11 0.1%
2017-03-07 20:37:24.000 11 0.1%
2017-03-12 18:41:37.000 11 0.1%
2017-03-11 16:39:44.000 11 0.1%
Other values (14784) 20152 99.4%

Length

Max length: 23
Mean length: 23
Min length: 23

Value Count Frequency (%)
Decimal_Number 10 71.4%
Other_Punctuation 2 14.3%
Space_Separator 1 7.1%
Dash_Punctuation 1 7.1%

Value Count Frequency (%)
Common 14 100.0%

Value Count Frequency (%)
ASCII 14 100.0%

lsg_1
Real number (ℝ≥0)

Distinct count: 7464
Unique (%): 36.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5314.595345
Minimum: 0
Maximum: 9979
Zeros: 1
Zeros (%): < 0.1%
Memory size: 158.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 676.9
Q1: 3311
Median: 5520
Q3: 7535
95-th percentile: 9432
Maximum: 9979
Range: 9979
Interquartile range (IQR): 4224

Descriptive statistics

Standard deviation: 2703.317282
Coefficient of variation (CV): 0.5086590994
Kurtosis: -1.00388407
Mean: 5314.595345
Median Absolute Deviation (MAD): 2283.403914
Skewness: -0.1493298783
Sum: 107774679
Variance: 7307924.325
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 968. 1017.5 1114.5 1714.5 ... 9620. 9622.5 9656. 9675.5 9979. ], "bayesian blocks" binning strategy used)

Common values

Value Count Frequency (%)
8604 79 0.4%
9432 60 0.3%
6977 59 0.3%
5543 57 0.3%
7529 54 0.3%
9433 53 0.3%
6980 53 0.3%
5574 53 0.3%
6988 53 0.3%
5577 52 0.3%
Other values (7454) 19706 97.2%

Minimum 5 values

Value Count Frequency (%)
0 1 < 0.1%
1 4 < 0.1%
2 1 < 0.1%
3 1 < 0.1%
4 7 < 0.1%

Maximum 5 values

Value Count Frequency (%)
9979 1 < 0.1%
9978 1 < 0.1%
9977 2 < 0.1%
9976 2 < 0.1%
9975 2 < 0.1%

lsg_2
Real number (ℝ≥0)

Distinct count: 5310
Unique (%): 26.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4187.653928
Minimum: 0
Maximum: 7604
Zeros: 1
Zeros (%): < 0.1%
Memory size: 158.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 576
Q1: 2251
Median: 4246
Q3: 6504.5
95-th percentile: 7205
Maximum: 7604
Range: 7604
Interquartile range (IQR): 4253.5

Descriptive statistics

Standard deviation: 2274.875522
Coefficient of variation (CV): 0.5432338873
Kurtosis: -1.354571927
Mean: 4187.653928
Median Absolute Deviation (MAD): 2028.057086
Skewness: -0.1343378906
Sum: 84921434
Variance: 5175058.64
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 30.5 156. 157.5 188.5 ... 7392.5 7482.5 7484.5 7486.5 7604. ], "bayesian blocks" binning strategy used)

Common values

Value Count Frequency (%)
6883 2037 10.0%
2250 223 1.1%
4404 130 0.6%
915 79 0.4%
7186 60 0.3%
7103 59 0.3%
6408 57 0.3%
2872 54 0.3%
7172 53 0.3%
7205 53 0.3%
Other values (5300) 17474 86.2%

Minimum 5 values

Value Count Frequency (%)
0 1 < 0.1%
1 1 < 0.1%
3 3 < 0.1%
6 2 < 0.1%
8 1 < 0.1%

Maximum 5 values

Value Count Frequency (%)
7604 2 < 0.1%
7603 1 < 0.1%
7602 1 < 0.1%
7601 1 < 0.1%
7600 3 < 0.1%

lsg_3
Real number (ℝ≥0)

Distinct count: 7032
Unique (%): 34.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4866.94551
Minimum: 0
Maximum: 9493
Zeros: 1
Zeros (%): < 0.1%
Memory size: 158.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 653
Q1: 2548
Median: 4839
Q3: 7387
95-th percentile: 8879
Maximum: 9493
Range: 9493
Interquartile range (IQR): 4839

Descriptive statistics

Standard deviation: 2713.856392
Coefficient of variation (CV): 0.5576097753
Kurtosis: -1.252581449
Mean: 4866.94551
Median Absolute Deviation (MAD): 2375.241978
Skewness: -0.02108544396
Sum: 98696788
Variance: 7365016.517
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 395.5 396.5 415. 436.5 ... 9427.5 9428.5 9439.5 9440.5 9493. ], "bayesian blocks" binning strategy used)

Common values

Value Count Frequency (%)
7995 130 0.6%
7114 79 0.4%
8539 60 0.3%
8528 59 0.3%
3541 57 0.3%
5777 54 0.3%
3540 53 0.3%
8541 53 0.3%
8545 53 0.3%
8540 53 0.3%
Other values (7022) 19628 96.8%

Minimum 5 values

Value Count Frequency (%)
0 1 < 0.1%
1 1 < 0.1%
2 1 < 0.1%
3 1 < 0.1%
4 3 < 0.1%

Maximum 5 values

Value Count Frequency (%)
9493 2 < 0.1%
9492 1 < 0.1%
9491 2 < 0.1%
9490 2 < 0.1%
9489 2 < 0.1%

lsg_4
Real number (ℝ≥0)

Distinct count: 1269
Unique (%): 6.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1679.152226
Minimum: 3
Maximum: 2056
Zeros: 0
Zeros (%): 0.0%
Memory size: 158.6 KiB

Quantile statistics

Minimum: 3
5-th percentile: 406
Q1: 1801
Median: 1912
Q3: 1912
95-th percentile: 1912
Maximum: 2056
Range: 2053
Interquartile range (IQR): 111

Descriptive statistics

Standard deviation: 485.699119
Coefficient of variation (CV): 0.2892525832
Kurtosis: 2.474228067
Mean: 1679.152226
Median Absolute Deviation (MAD): 349.439593
Skewness: -2.000614206
Sum: 34051528
Variance: 235903.6342
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 3. 4. 109. 115.5 193. ... 1973.5 1990. 2031. 2034. 2056. ], "bayesian blocks" binning strategy used)

Common values

Value Count Frequency (%)
1912 13076 64.5%
406 533 2.6%
1873 233 1.1%
1887 132 0.7%
1845 125 0.6%
1816 120 0.6%
1786 118 0.6%
1859 110 0.5%
1705 107 0.5%
591 89 0.4%
Other values (1259) 5636 27.8%

Minimum 5 values

Value Count Frequency (%)
3 8 < 0.1%
5 1 < 0.1%
8 1 < 0.1%
11 1 < 0.1%
13 1 < 0.1%

Maximum 5 values

Value Count Frequency (%)
2056 1 < 0.1%
2055 1 < 0.1%
2054 1 < 0.1%
2053 1 < 0.1%
2052 1 < 0.1%

lsg_5
Real number (ℝ≥0)

Distinct count: 11
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8.652694906
Minimum: 0
Maximum: 10
Zeros: 3
Zeros (%): < 0.1%
Memory size: 158.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 9
Median: 9
Q3: 10
95-th percentile: 10
Maximum: 10
Range: 10
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 2.34938795
Coefficient of variation (CV): 0.2715209511
Kurtosis: 5.437682115
Mean: 8.652694906
Median Absolute Deviation (MAD): 1.366414041
Skewness: -2.6002019
Sum: 175468
Variance: 5.519623741
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 1.5 2.5 3.5 4.5 ... 6.5 7.5 8.5 9.5 10. ], "bayesian blocks" binning strategy used)

Common values

Value Count Frequency (%)
9 10390 51.2%
10 7605 37.5%
1 1369 6.8%
3 404 2.0%
8 247 1.2%
6 127 0.6%
4 105 0.5%
7 15 0.1%
5 12 0.1%
0 3 < 0.1%

Minimum 5 values

Value Count Frequency (%)
0 3 < 0.1%
1 1369 6.8%
2 2 < 0.1%
3 404 2.0%
4 105 0.5%

Maximum 5 values

Value Count Frequency (%)
10 7605 37.5%
9 10390 51.2%
8 247 1.2%
7 15 0.1%
6 127 0.6%

lsg_6
Real number (ℝ≥0)

Distinct count: 1583
Unique (%): 7.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1265.898171
Minimum: 0
Maximum: 2065
Zeros: 1
Zeros (%): < 0.1%
Memory size: 158.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 150
Q1: 577.5
Median: 1616
Q3: 1899
95-th percentile: 1913
Maximum: 2065
Range: 2065
Interquartile range (IQR): 1321.5

Descriptive statistics

Standard deviation: 697.8384954
Coefficient of variation (CV): 0.5512595813
Kurtosis: -1.379955601
Mean: 1265.898171
Median Absolute Deviation (MAD): 635.0477428
Skewness: -0.5218620285
Sum: 25671149
Variance: 486978.5657
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 34.5 36.5 62.5 63.5 ... 2021.5 2022.5 2037.5 2038.5 2065. ], "bayesian blocks" binning strategy used)

Common values

Value Count Frequency (%)
1899 7276 35.9%
150 2065 10.2%
1913 566 2.8%
921 461 2.3%
746 337 1.7%
1778 302 1.5%
1689 243 1.2%
1266 192 0.9%
1616 188 0.9%
63 183 0.9%
Other values (1573) 8466 41.7%

Minimum 5 values

Value Count Frequency (%)
0 1 < 0.1%
1 2 < 0.1%
2 2 < 0.1%
3 1 < 0.1%
4 2 < 0.1%

Maximum 5 values

Value Count Frequency (%)
2065 1 < 0.1%
2064 3 < 0.1%
2063 1 < 0.1%
2062 1 < 0.1%
2059 1 < 0.1%

uk_date1
Categorical

HIGH CARDINALITY
TYPE DATE
UNIFORM
Distinct count: 16384
Unique (%): 80.8%
Missing: 0
Missing (%): 0.0%
Memory size: 158.6 KiB

Value Count Frequency (%)
2017-03-31 12:40:24.000 17 0.1%
2017-03-30 12:40:24.000 14 0.1%
2017-04-02 14:40:24.000 14 0.1%
2017-03-29 12:40:24.000 13 0.1%
2017-04-01 16:40:28.000 13 0.1%
2017-03-29 12:42:20.000 13 0.1%
2017-04-02 16:40:25.000 12 0.1%
2017-03-29 16:40:24.000 12 0.1%
2017-03-31 14:40:24.000 12 0.1%
2017-04-02 12:40:28.000 12 0.1%
Other values (16374) 20147 99.3%

Length

Max length: 23
Mean length: 23
Min length: 23

Value Count Frequency (%)
Decimal_Number 10 71.4%
Other_Punctuation 2 14.3%
Space_Separator 1 7.1%
Dash_Punctuation 1 7.1%

Value Count Frequency (%)
Common 14 100.0%

Value Count Frequency (%)
ASCII 14 100.0%

uk_date2
Categorical

HIGH CARDINALITY
TYPE DATE
Distinct count: 7210
Unique (%): 35.6%
Missing: 0
Missing (%): 0.0%
Memory size: 158.6 KiB

Value Count Frequency (%)
2016-11-04 02:00:00.000 97 0.5%
2016-11-05 02:00:00.000 96 0.5%
2016-11-05 05:00:00.000 89 0.4%
2016-11-04 03:00:00.000 88 0.4%
2016-11-05 04:00:00.000 87 0.4%
2016-11-04 05:00:00.000 82 0.4%
2016-11-04 01:00:00.000 82 0.4%
2016-11-04 04:00:00.000 81 0.4%
2016-11-05 01:00:00.000 80 0.4%
2016-11-03 04:00:00.000 79 0.4%
Other values (7200) 19418 95.8%

Length

Max length: 23
Mean length: 23
Min length: 23

Value Count Frequency (%)
Decimal_Number 10 71.4%
Other_Punctuation 2 14.3%
Space_Separator 1 7.1%
Dash_Punctuation 1 7.1%

Value Count Frequency (%)
Common 14 100.0%

Value Count Frequency (%)
ASCII 14 100.0%

is_discounted
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 158.6 KiB

Value Count Frequency (%)
0 15622 77.0%
1 4657 23.0%

volumes
Real number (ℝ≥0)

MISSING
Distinct count: 25
Unique (%): 0.3%
Missing: 12956
Missing (%): 63.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.51536256
Minimum: 5
Maximum: 29
Zeros: 0
Zeros (%): 0.0%
Memory size: 158.6 KiB

Quantile statistics

Minimum: 5
5-th percentile: 6
Q1: 9
Median: 13
Q3: 24
95-th percentile: 28
Maximum: 29
Range: 24
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 7.579669015
Coefficient of variation (CV): 0.4885267094
Kurtosis: -1.386066273
Mean: 15.51536256
Median Absolute Deviation (MAD): 6.841471509
Skewness: 0.3927800415
Sum: 113619
Variance: 57.45138237
Histogram with fixed size bins (bins=10)

Common values

Value Count Frequency (%)
10 730 3.6%
9 705 3.5%
8 601 3.0%
7 542 2.7%
6 413 2.0%
25 410 2.0%
27 383 1.9%
26 366 1.8%
11 358 1.8%
28 299 1.5%
Other values (15) 2516 12.4%
(Missing) 12956 63.9%

Minimum 5 values

Value Count Frequency (%)
5 18 0.1%
6 413 2.0%
7 542 2.7%
8 601 3.0%
9 705 3.5%

Maximum 5 values

Value Count Frequency (%)
29 134 0.7%
28 299 1.5%
27 383 1.9%
26 366 1.8%
25 410 2.0%

price
Real number (ℝ≥0)

Distinct count: 12959
Unique (%): 63.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 143.4044115
Minimum: 0.01
Maximum: 7010.27
Zeros: 0
Zeros (%): 0.0%
Memory size: 158.6 KiB

Quantile statistics

Minimum: 0.01
5-th percentile: 17.3
Q1: 45.645
Median: 75.6
Q3: 126.845
95-th percentile: 522.848
Maximum: 7010.27
Range: 7010.26
Interquartile range (IQR): 81.2

Descriptive statistics

Standard deviation: 267.2811593
Coefficient of variation (CV): 1.863828014
Kurtosis: 84.11998212
Mean: 143.4044115
Median Absolute Deviation (MAD): 122.8924222
Skewness: 6.732911599
Sum: 2908098.06
Variance: 71439.21814
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1.000000e-02 1.255000e+00 1.505000e+00 2.505000e+00 5.055000e+00 ... 1.222500e+03 1.541600e+03 1.930330e+03 2.168745e+03 7.010270e+03], "bayesian blocks" binning strategy used)

Common values

Value Count Frequency (%)
69.59 8 < 0.1%
46.25 8 < 0.1%
51 8 < 0.1%
50.49 8 < 0.1%
59.01 8 < 0.1%
32.74 7 < 0.1%
43.01 7 < 0.1%
45.92 7 < 0.1%
41.81 7 < 0.1%
69.37 7 < 0.1%
Other values (12949) 20204 99.6%

Minimum 5 values

Value Count Frequency (%)
0.01 1 < 0.1%
1.25 1 < 0.1%
1.26 1 < 0.1%
1.28 1 < 0.1%
1.3 1 < 0.1%

Maximum 5 values

Value Count Frequency (%)
7010.27 1 < 0.1%
6189.67 1 < 0.1%
5619.61 1 < 0.1%
5460.49 1 < 0.1%
4903.18 1 < 0.1%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the slope does not affect r; only its sign does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
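
The calculation just described can be sketched in plain Python. This is an illustration of the definition under stated assumptions, not the code pandas-profiling runs; the function name and the sample data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n factors of the covariance and of the two variances cancel,
    # so plain sums of products are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.0, 9.8]

r = pearson_r(x, y)
# Shifting y and rescaling it by a positive factor leaves r unchanged (up to rounding).
r_rescaled = pearson_r(x, [10.0 * v + 3.0 for v in y])
print(r, r_rescaled)
```

A constant input has zero variance, so the division fails with ZeroDivisionError; r is undefined in that case.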

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
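
As a sketch of this recipe in plain Python: rank both variables, then apply the covariance-over-standard-deviations formula to the ranks. Giving tied values the average of their positions is an assumption made here (it is the usual convention); the function names and sample data are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    # Spearman's rho is Pearson's r computed on the rank variables.
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but not linear

print(spearman_rho(x, y))  # 1.0
print(pearson_r(x, y))     # below 1, because the relationship is not linear
```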

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is then the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
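
The pair-counting rule can be sketched directly in plain Python. The version below is the simple form described here (often called tau-a), in which tied pairs count as neither concordant nor discordant; library implementations commonly use a tie-corrected variant, so results can differ when ties are present. The function name and sample data are hypothetical.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs, over all pairs of observations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
        # direction == 0 is a tie and is counted in neither group
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


x = [1, 2, 3, 4]
y = [1, 3, 2, 4]  # one swapped pair out of six

tau = kendall_tau_a(x, y)
print(tau)  # (5 - 1) / 6
```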

Missing values

Sample

First rows

gift_id gift_type gift_category gift_cluster instock_date stock_update_date lsg_1 lsg_2 lsg_3 lsg_4 lsg_5 lsg_6 uk_date1 uk_date2 is_discounted volumes price
0GF_111566153439422014-02-21 05:07:06.0002016-11-09 15:49:51.000337752215041912105542014-02-24 08:07:06.0002014-02-24 07:07:06.0000NaN175.54
1GF_111576153439422014-02-21 06:07:06.0002016-11-11 13:49:51.000337752215041912105542014-02-22 07:07:06.0002014-02-24 06:07:06.0001NaN95.80
2GF_1568958426202014-02-21 09:30:21.0002016-03-24 14:46:18.0005290157932031912915782016-01-26 00:04:45.0002016-03-18 02:00:00.0001NaN107.35
3GF_111556153439422014-02-22 05:07:06.0002016-11-10 16:49:51.000337752215041912105542016-11-07 13:49:51.0002016-11-06 04:00:00.0000NaN172.90
4GF_111586153439422014-02-22 07:07:06.0002016-11-10 13:49:51.00033775221504191295542016-11-07 15:49:51.0002016-11-06 01:00:00.0001NaN77.72
5GF_1568658426202014-02-23 07:30:21.0002016-03-23 15:46:18.0005290157932031912915782014-02-25 07:30:21.0002014-02-24 08:30:21.0001NaN142.84
6GF_1569058426202014-02-23 10:30:21.0002016-03-23 18:46:18.0005290157932031912915782016-01-25 02:04:45.0002016-03-16 02:00:00.0001NaN78.71
7GF_1568558426202014-02-24 10:30:21.0002016-03-24 17:46:18.0005290157932031912915782016-01-22 03:04:45.0002016-03-17 03:00:00.0000NaN166.81
8GF_111596153439422014-02-25 07:07:06.0002016-11-10 14:49:51.000337752215041912105542016-11-07 16:49:51.0002015-04-08 12:48:27.0250NaN72.72
9GF_1568858426202014-02-25 08:30:21.0002016-03-24 16:46:18.0005290157932031912915782016-01-23 03:04:45.0002016-03-11 04:00:00.0001NaN107.12

Last rows

gift_id gift_type gift_category gift_cluster instock_date stock_update_date lsg_1 lsg_2 lsg_3 lsg_4 lsg_5 lsg_6 uk_date1 uk_date2 is_discounted volumes price
20269GF_2398541518842562016-11-12 13:17:55.0002016-11-16 12:50:47.00021584524381912101502016-11-16 12:50:47.0002016-10-30 02:00:00.0001NaN37.31
20270GF_9030117353442612016-11-12 13:17:57.0002016-11-15 11:17:57.000600163671019121018992016-11-13 12:17:57.0002016-11-03 03:00:00.0001NaN35.61
20271GF_22761126243354872016-11-12 13:18:10.0002016-11-17 11:18:10.0001491688328701912109212016-11-11 14:18:10.0002016-11-04 05:00:00.0001NaN36.26
20272GF_1377253143359802016-11-12 13:23:35.0002016-11-15 13:23:35.00050565490295419121020382016-11-15 14:23:35.0002016-11-13 13:23:35.0001NaN24.71
20273GF_103521262433422016-11-12 13:26:09.0002016-11-17 11:26:09.00014806883374719121018992016-11-12 14:26:09.0002016-11-10 03:00:00.0001NaN22.02
20274GF_1026910570464482016-11-12 13:46:42.0002016-11-17 10:46:42.0002055688399519121018992016-11-14 14:46:42.0002016-11-11 03:00:00.0000NaN57.68
20275GF_585412205268172016-11-12 13:46:47.0002016-11-18 13:46:47.00083236753670619121018992016-11-13 10:46:47.0002016-10-28 02:00:00.0000NaN122.87
20276GF_5635097058212016-11-12 13:46:57.0002017-01-21 19:30:04.00028264009291219121014512017-01-21 18:30:04.0002017-01-18 01:00:00.0000NaN47.14
20277GF_910768221356202016-11-12 13:47:01.0002016-11-18 11:47:01.0002089688336071912108222016-11-14 12:47:01.0002016-11-13 02:00:00.0000NaN47.68
20278GF_5683106121249872016-11-12 13:48:30.0002016-11-18 12:48:30.0008981814610919121018992016-11-12 10:48:30.0002016-11-10 02:00:00.0000NaN52.81